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Cysteine protease enzymes are important for human physiology and catalyze key protein 
degradation pathways. These enzymes react via a nucleophilic reaction mechanism that 
involves a cysteine residue and the proton of a proximal histidine. Particularly efficient 
inhibitors of these enzymes are nitrile-based, however, the details of the catalytic reaction 
mechanism currently are poorly understood. To gain further insight into the inhibition of 
these molecules, we have performed a combined density functional theory and quantum 
mechanics/molecular mechanics study on the reaction of a nitrile-based inhibitor with the 
enzyme active site amino acids. We show here that small perturbations to the inhibitor 
structure can have dramatic effects on the catalysis and inhibition processes. Thus, we 
investigated a range of inhibitor templates and show that specific structural changes 
reduce the inhibitory efficiency by several orders of magnitude. Moreover, as the reaction 
takes place on a polar surface, we find strong differences between the DFT and QM/MM 
calculated energetics. In particular, the DFT model led to dramatic distortions from the 
starting structure and the convergence to a structure that would not fit the enzyme 
active site. In the subsequent QM/MM study we investigated the use of mechanical vs. 
electronic embedding on the kinetics, thermodynamics and geometries along the reaction 
mechanism. We find minor effects on the kinetics of the reaction but large geometric and 
thermodynamics differences as a result of inclusion of electronic embedding corrections. 
The work here highlights the importance of model choice in the investigation of this 
biochemical reaction mechanism. 
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INTRODUCTION 

Proteases are important enzymes in human physiology and 
involved in terminal protein degradation via a nucleophilic acid- 
base reaction (Chapman et al., 1997; Otto and Schirmeister, 
1997; Madala et al., 2010). They are classified within four dif- 
ferent classes that are based on the Lewis base in the catalytic 
reaction mechanism, namely serine-, cysteine-, aspartate-, and 
metalloproteases. The former two uses the terminal side chain 
of a serine or cysteine amino acid as nucleophilic center in the 
reaction mechanism and are the most abundant proteases in the 
body. Members of the cysteine protease family include intracellu- 
lar proteases called cathepsins that take part in hydrolytic cleavage 
of peptide bonds, apoptosis as well as immune responses. As 
such, the pathogenesis of microbial infections, arthritis, osteo- 
porosis and cancer require the involvement of cysteine proteases, 
which makes them an important target of drug development 
and chemotherapy (Selzer et al., 1999; Palermo and Joyce, 2007; 
Robinson et al, 2008). In addition, they have been implicated 
in atherosclerosis-based vascular diseases (Lutgens et al, 2007; 
Cheng et al, 201 1; Ratovitski et al, 201 1). 

Due to their pharmaceutical interest considerable research has 
been devoted to establish the mechanism and function of cys- 
teine protease enzymes (Kato et al, 2005; Hang et al, 2006; 
Shokhen et al., 201 1). Furthermore, research into finding suitable 



inhibitors of the cysteine protease family is paramount (Arkin 
and Wells, 2004; Rzychon et al, 2004; Drahl et al, 2005; Tyndall 
et al., 2005; Ehmke et al., 2011). Inhibition can take place via 
several different methods, including covalent inhibition, block- 
age or distortion of the catalytic active site via competition 
with non-covalent inhibitors. One particular class of covalent 
inhibitors that have been particularly useful for cysteine pro- 
teases are nitrile-based compounds (Oballa et al, 2007; MacFaul 
et al, 2009; Morley et al, 2009; Ehmke et al, 2012). Currently, 
however, the precise mechanism of the interaction between these 
inhibitors and active site cysteine is not clear. In order to under- 
stand the inhibition process by cysteine proteases, and, especially, 
its mechanism with inhibitors we decided to do a computational 
study. The calculations use density functional theory (DFT) on 
model complexes and quantum mechanics/molecular mechanics 
(QM/MM) studies on a complete cysteine protease enzyme. 

Cysteine proteases bind their substrates in a groove located on 
the surface of the enzyme, therefore the reaction takes place on 
a relatively polar region with many interacting hydrogen bond- 
ing donor and acceptor groups (Sajid and McKerrow, 2002). 
A protein-ligand structure of cathepsin K (Altmann et al., 2004) 
is shown in Figure 1, whereby the nitrile-based inhibitor is cova- 
lently linked to a cysteine residue (Cys2s) in the active site of 
the protease and within hydrogen bonding distance to a histidine 
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FIGURE 1 | Extract of the 1U9V pdb structure of cathepsin K with key 
amino acids and inhibitor highlighted. 



residue (His 162)- It has been hypothesized that this histidine 
residue donates a proton during the covalent linkage forma- 
tion between inhibitor and the cysteine residue, but the intricate 
details of this mechanism remain elusive. The computational 
study described in this work will address this mechanism and 
establish the key factors that drive the reaction mechanism as well 
as the influence of the surrounding protein. 

Several groups have studied the catalytic mechanism of cys- 
teine protease with computational techniques, including quan- 
tum mechanics/molecular mechanics (QM/MM) (Helten et al., 
2004; Ma et al, 2007; Mladenovic et al, 2008a,b,c) and den- 
sity functional theory (DFT) model complexes and focused on 
cathepsin B proteases (Shokhen et al, 2009; Shankar et al, 2010). 
These studies implicated strong influences of the hydrogen bond- 
ing network in the substrate/inhibitor binding pocket and led to 
increased stability of the HisH + — Cys - ion pair as compared to 
its neutral form, although this may change after substrate bind- 
ing. QM/MM studies on the catalytic mechanism of these cysteine 
proteases revealed a rate determining acylation step with a barrier 
of 19.8 kcal mol _1 (Ma et al., 2007). These values compare well 
with other nucleophilic addition reactions in related enzymes, 
such as sortase A that has a similar active site as cysteine protease 
and reacts with a similar catalytic machinery (Tian and Eriksson, 
2011). So far, few studies have addressed inhibitor binding and 
the effect of compound reactivity and in particular nitrile-based 
inhibitors have not been thoroughly studied computationally in 
this way (Mladenovic et al., 2008c). We therefore initiated this 
study to understand the covalent mechanism of this chemotype 
through its binding to cathepsin K. 

The specific mechanism of how nitrile-based inhibitors inter- 
act with cathepsin K is not known, it is however clear that these 
inhibitors bind tightly within the active site and inhibit via the 
reversible formation of a covalent bond (Helten et al, 2004). To 
find out, how structural changes to these inhibitors affect their 
inhibition, we modeled the reaction of cathepsin K with four 
inhibitor scaffolds (Scheme 1): SI and S2 are the protonated and 
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SCHEME 1 I Nitrile-based inhibitor templates investigated in this 
study. 



neutral purine-based nitriles, whereas in S3 and S4 the nitrogen 
atoms of the six-membered ring were replaced by carbon. The 
purpose of investigating S3 and S4 is because studies have shown 
such a change results in a significant reduction in the reactivity of 
the nitrile functionality (Altmann et al., 2004). As the reaction 
takes place on the surface of the enzyme, and the interactions 
between the inhibitors and protein affect the reaction we have 
opted for two sets of calculations. In the first set we take a model of 
the substrate binding- site and reaction position and calculate the 
mechanism with density functional theory. In a subsequent set 
of calculations we use quantum mechanics/molecular mechan- 
ics (QM/MM) on the complete enzyme. As we show here, there 
are dramatic differences in substrate activation between the two 
models that highlight the importance of the full QM/MM model 
in the calculations. 

METHODS 

The studies presented in this work use density functional the- 
ory (DFT) as well as quantum mechanics/molecular mechanics 
(QM/MM) methods as implemented in the Jaguar and Gaussian- 
03 program packages (Frisch et al., 2004; Jaguar, 2009). We used 
two approaches to investigate the mechanism of cysteine protease 
inhibition: (i) DFT calculations on a model complex that repre- 
sents the active species and key hydrogen bonding interactions, 
so-called QM-cluster calculations, and (ii) Full QM/MM on the 
complete enzyme. In the past we used the QM-cluster methodol- 
ogy extensively to describe the mechanistic features of important 
enzymes for human health, for example of the cytochromes 
P450, (Shaik et al, 2005; de Visser, 2010; Kumar et al, 2010) 
cysteine dioxygenase (Aluri and de Visser, 2007) and taurine/a- 
ketoglutarate dependent dioxygenase (de Visser, 2006). Although 
these QM-cluster calculations in many cases were able to repro- 
duce experimentally determined product distributions, kinetic 
isotope effects and rate constants as well as spectroscopic features 
of key stable intermediates (Kumar et al., 2005; Vardhaman et al., 
2011, 2013a,b), we cannot be certain that the methodology will 
also work on an enzyme such as a cysteine protease. In partic- 
ular, QM/MM studies on these enzymatic systems showed that 
in many cases the active features were sufficient to describe the 
enzyme accurately (Godfrey et al., 2008; Porro et al., 2009; Kumar 
et al, 2011b), but in cases where the active species has close 
lying electronic states environmental perturbations were shown 
to change the electronic properties of the reactant and conse- 
quently the reactivity patterns (Ogliaro et al., 2000; de Visser et al., 
2002; Leeladee et al, 2012). We anticipated similar problems in 
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SCHEME 2 | DFT and QM region of the QM/MM model of cysteine 
protease used in this study. Atoms marked with a star were kept fixed in 
the DFT calculations. 



the current system, where strong polar interactions influenced 
the reaction kinetics, therefore, a combined DFT and QM/MM 
approach was applied. 

QM-CLUSTER CALCULATIONS 

The DFT model complex is based on the key features of the 
active site of cysteine protease with inhibitor bound as displayed 
in Scheme 2. Since, the inhibitor is bound on the surface of 
the enzyme via a range of hydrogen bonding interactions, we 
included a large proportion of the active site including these key 
interacting residues. Important hydrogen bonding interactions of 
the protein with these inhibitor templates, include the interac- 
tion with the cysteine residue, the alcohol group of Serig3, the 
peptide chain carbonyl group of Alai6i, Alai63 and Asni82> and 
a bridging water molecule (W402). To prevent unnatural changes 
in the geometry during the optimization procedure, we fixed sev- 
eral carbon atoms of the peptide chains as identified with a star in 
Scheme 2. However, as geometric constraints often lead to geom- 
etry convergence problems and small imaginary frequencies, we 
kept the number of fixed atoms to a minimum here (Pratter et al., 
2013; de Visser et al., 2014). This model has overall charge of + 1 
and was calculated in the singlet spin state only. Note that the 
structural constraint had little effect on the geometry optimiza- 
tion and all local minima had real frequencies only. The inhibition 
of nitriles by cysteine protease was investigated with four differ- 
ent inhibitor molecules as identified with SI, S2, S3 and S4 in 
Scheme 1 above. SI and S2 are based on the inhibitor bound 
in the 1U9V pdb file and differ by the protonation state of the 
nitrogen atom of the five-membered ring. As a consequence the 
model with substrate SI has overall charge +1, whereas the model 
with substrate S2 is neutral. In addition, we investigated two sub- 
strates S3 and S4, where one of the nitrogen atoms of the pyridine 
ring was replaced by a CH group. These structures are expected 
to have a lower intrinsic reactivity and differ in their hydrogen 
bond acceptor patterns. It has been shown (Ehmke et al., 2012) 
that molecules with high electrophilicity inhibit cysteine protease 
better than those with lower electrophilicity. In particular, com- 
pounds, such as S2, which contain a pyrimidine, were found to 
inhibit more effectively than compounds based on the S3 and S4 
templates that use a pyridine instead. 

Following previous experience with DFT on nucleophilic addi- 
tion reactions (de Visser, 2012), we use the hybrid density 
functional method B3LYP (Becke, 1993; Lee et al, 1998) in com- 
bination with a 6-3 1G* basis set, BS1, for geometry optimizations 
and frequency calculations. We performed a full geometry opti- 
mization followed by an analytical frequency. To test the effect 
of the basis set on the energetics we ran single point B3LYP/6- 
31 1++G** calculations, basis set BS2. For a selection of reactions 
we also tested the effect of solvent using the polarized contin- 
uum model as implemented in Jaguar with a dielectric constant 
of € = 5.7 and a probe radius of 2.66 A. 

QM/MM SET-UP 

In a second set of calculations we carried out QM/MM calcu- 
lations starting from the inhibitor bound structure of cathepsin 
K: 1U9V pdb structure (Altmann et al, 2004). We used well 
tested QM/MM procedures, which we applied to the catalytic 



reaction mechanisms of heme and non-heme iron enzymes pre- 
viously (Godfrey et al, 2008; Porro et al, 2009; Kumar et al, 
2011b; Quesne et al., 2014). The work started out from the 
inhibitor-bound enzyme monomer as deposited as the 1U9V 
crystal structure, which we initially updated to include all missing 
heavy atoms of the amino acids using the MOE program package 
(MOE, 2008). Subsequently, we used these starting coordinates to 
add hydrogen atoms to the heavy atoms with the PDB2PQR pro- 
gram package (Dolinsky et al, 2007). All amino acids were then 
protonated according to the usual pK a conventions (Dolinsky 
et al., 2007) at pH = 7 and analyzed with the Propka program 
package; this resulted in a structure with all aspartate and gluta- 
mate amino acids in their deprotonated forms and all arginine 
and lysine amino acids as protonated. The peptide chain con- 
tains two histidine amino acids and upon visual inspection it was 
decided to take His 1 62 as doubly protonated, whereas His 177 as 
singly protonated on atom Ne only. This gave us a system that 
is overall charge neutral. Subsequently, our chemical system was 
solvated in a sphere where the protein was extended with a water 
layer with a diameter of 10 A using periodic boundary conditions. 
We repeated the solvation procedure in several steps where we 
attempted to add further water molecules to the system after each 
step and kept repeating this process until few water molecules 
could be added. This resulted in a chemical system with a total 
of 9580 atoms. Note, that the structure is energy minimized after 
each water addition step until a threshold of minimum number of 
water molecules added was reached. This system was then equi- 
librated and subjected to a molecular mechanics minimization 
using the FF94 force field (Wang et al, 2004) and then gradu- 
ally heated to above room temperature (298 K) conditions using 
a heating procedure. During the equilibration the water acces- 
sible areas of the protein were comprehensibly solvated and the 
protein unfolded into a more relaxed structure using a dynamical 
protocol built into the MOE software package (MOE, 2008). The 
equilibration was performed in a consecutive set of minimiza- 
tions starting with restrained protein and ligand and followed by 
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an fully unrestrained calculation. Subsequently, the chemical sys- 
tem was gradually heated from 0 to 300 K with restrained ligand 
and protein. Finally, an NVT molecular dynamics simulation was 
done for 500 ps in steps of 0.001 ps until a temperature of 300 K 
was reached. In order to parameterize and obtain a topology for 
our purine-based nitrile inhibitor the antechamber component 
of the AMBER 10 package was used (Cornell et al, 1995; Tao and 
Schlegel, 2010). In this particular case, for each atom and bond 
type specific topology and parameters were identified based on 
known residue values. 

QM/MM CALCULATIONS 

We then selected a QM region that contained the same model 
as displayed in Scheme 2 for the DFT model calculations. The 
QM region was overall charge neutral and was calculated in 
the closed- shell singlet spin state only. The ONIOM approach 
(Vreven et al., 2006). was used for the QM/MM calculations with 
the B3LYP/6-311G method describing the QM region. We tested 
several force fields for the description of the MM region including 
the UFF/ZDO and Amber methods (Rappe et al, 1992; Cornell 
et al, 1995). The link-atom approach (Singh and Kollman, 1986) 
describes the division between the QM and MM regions as 
hydrogen link-atoms. All structures and energies reported here 
were fully optimized (without constraints) using DFT/ AMBER in 
Gaussian-03, whereby all atoms within a radius of 8A from the 
reaction center were free to move during the optimization and 
the rest was fixed in its position throughout the QM/MM cal- 
culation. As frequency calculations were impossible on the full 
QM/MM structure in Gaussian, we extracted the QM region 
from the optimized QM/MM geometries, added hydrogen atoms 
and ran an analytical frequency calculation on the QM region 
only. To prevent excessive small imaginary frequencies in struc- 
tures, we reoptimized the hydrogen atoms before running the 
frequency calculation, however, none of the local minima dis- 
cussed here had any imaginary frequencies. As will be shown 
in the results section the differences in AE and AE+ZPE ener- 
gies are small, and do not change the trends reported in this 
work. The QM/MM energies (£qm/mm) were calculated using 
Equation 1 as the separate energies of a QM calculation on the 
QM region (£ mo del, qm) and an MM calculation on the complete 
system (£ rea l, mm) minus an MM calculation on the QM region 
(£model,MM)- QM/MM optimizations were initially done using 
DFT/ AMBER with mechanical embedding included and were fol- 
lowed by DFT/AMBER single point calculations with electronic 
embedding included. We later also did full geometry optimiza- 
tions with electronic embedding included at the DFT/AMBER 
level of theory. 

^QM/MM = ^rnodel, QM + ^real, MM ~ ^model, MM (1) 
MECHANISTIC STUDIES 

The covalent inhibition of the inhibitor molecules was investi- 
gated via two possible scenarios, namely a concerted or a stepwise 
mechanism, Scheme 3. The first possibility is sequential nucle- 
ophilic addition and proton transfer in a stepwise mechanism, 
where the thiolate group of Cys25 attacks the carbon atom of 
the nitrile group first to form a covalent bond. Subsequently, the 



negatively charged nitrogen atom of this bond abstracts a proton 
from Hisi62- Alternatively the nucleophilic addition and proton 
transfer happen in a single concerted reaction step. To ascer- 
tain the reaction mechanism of these inhibitors against cysteine 
protease enzymes we investigated both possible reaction mecha- 
nisms using two computational methods: Firstly, we did extensive 
density functional theory (DFT) studies on an active site model 
complex. Secondly, full quantum mechanics/molecular mechan- 
ics (QM/MM) studies on a substrate bound crystal structure were 
performed. 

RESULTS AND DISCUSSION 

Thus, in the proposed reaction mechanism, the thiolate group of 
Cys25 reacts via a nucleophilic addition reaction with substrate 
to form a covalent linkage. The, thereby formed intermediate 
abstracts a proton from a nearby histidine residue (Hisi62) to give 
the product complex, Scheme 2. Currently, it is unclear whether 
this mechanism is concerted or stepwise and what the effect of the 
protein is on the reaction mechanism. 

We started the work with a DFT investigation into the catalytic 
reaction mechanism of cysteine proteases, using QM-cluster 
model as described in Scheme 2 above and the results obtained 
with templates SI, S2, S3 and S4 are given in Figure 2. We define 
the reaction as proceeding from a reactant complex (R) that is 
initiated by an electrophilic attack of the thiolate group on the 
nitrile via a transition state TSe leading to an intermediate state 
Ie. In a subsequent proton transfer reaction via transition state 
TSr we form the product complex R Thus, using a substrate 
with the imidazolium group doubly protonated we find a step- 
wise mechanism, but if it is deprotonated on the position a 
concerted pathway is found via a single transition state (TSe), 
where the imaginary mode gives simultaneous S-C bond forma- 
tion and proton transfer from His 1 62 to the nitrile nitrogen atom 
to form the product complex R By contrast, for substrate SI the 
reaction is stepwise via an intermediate complex and two barriers 
TSe and TSh separating the intermediate were characterized. As 
a consequence the processes for S2, S3 and S4 are concerted via 
a single transition state with an imaginary frequency that shows 
a mixture of both electrophilic attack and proton transfer. For 
inhibitor scaffolds SI, S2, and S3 this barrier is characterized by 
a small imaginary frequency, which is typical for sulfur- involved 
transition states (Kumar et al, 201 la). Using S4, the proton trans- 
fer is more dominating in the transition state, likely due to the 
later transition state and hence the imaginary frequency is larger 
(z^O.ycm- 1 ). 

Optimized geometries of the transition states (TSe and TSh) 
are displayed alongside the mechanism of the reaction in Figure 2. 
The concerted TSe transition states have the proton originating 
from Hisi62 at a relatively short distance of 1.456-1.542 A from 
the nitrile nitrogen atom, where it is 1.746 A in the stepwise TS 
for substrate SI. Interestingly, the nitrile C-N distance shows little 
variation between the four different transition states and ranges 
from 1.196 to 1.207 A. The C-S distance is slightly longer in the 
stepwise transition state as compared to the concerted transition 
states. 

Energetically, the nucleophilic mechanisms are character- 
ized by low energy barriers of between 5 and 6kcal mol -1 
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SCHEME 3 | Proposed reaction mechanisms of cysteine protease with nitrile-based inhibitors. 





Substrates: SI (S2) {S3} [S4] 



FIGURE 2 | QM-cluster calculations on the reaction mechanism of 
inhibition of substrate by cysteine protease. In blue we give the stepwise 
mechanism and in red the concerted mechanism. Bond lengths are given in 



angstroms, the imaginary frequency in the transition states in cm -1 and 
relative energies {AE+ZPE with energies calculated with basis set B2) in kcal 
mol -1 . 



for templates SI, S2, and S4, whereas S3 gives a somewhat 
enhanced barrier of 12.7 kcal mol -1 in the gas-phase. Therefore, 
the inhibition of cathepsin K by these nitrile-based inhibitors 
would be expected to be a fast and efficient process. Although 
technically the mechanism using substrate SI is stepwise with 
well- characterized transition states for nucleophilic addition and 
proton transfer, the proton transfer barrier is very small (of 
the order of 0.1 kcal mol -1 ). This implies that the intermedi- 
ate complex will have an ultrashort lifetime and proton transfer 
will happen instantaneously prior after the nucleophilic addi- 
tion. Consequently, the mechanism perceived will be a concerted 
nucleophilic addition and proton transfer. Calculations with a 
dielectric continuum added that mimics a solvent gave little 
changes in the relative energies. 

Replacement of one of the nitrogen atoms of the pyridine 
ring in S2 by a C-H group as in S3 leads to a raise in barriers 
for the nucleophilic reaction mechanism. Thus, using substrates 
SI, S2, and S4 the peptide chain N-H group of Cys25 interacts 
with the pyridine nitrogen atom in a weak hydrogen bonding 



interaction. However, replacement of this nitrogen atom by a C-H 
group leads to a repulsive interaction between the N-H and C-H 
groups, which leads to some tilting of the substrate group and 
weakening of the hydrogen bonding interactions between sub- 
strate and protein. Because of this the barriers are raised and 
unfavorable substrate inhibition is observed. This result is in 
good agreement with experimental inhibition studies that found 
higher activity for S2 type inhibitors than for pyridine containing 
inhibitors (Ehmke et al., 2012). The effect of substitution of the 
other pyridine nitrogen atom by C-H has little influence on the 
reaction mechanism and thermodynamics since it is not involved 
in direct hydrogen bonding interactions. Moreover, this group is 
located on the edge of the protein and points toward the solvent, 
where its thermodynamic perturbation will be limited. As such it 
may be difficult to establish the intricate details of the reactivity 
differences between S3 and S4 experimentally. 

To confirm the suggested mechanism and ascertain the effect 
of the protein, we decided to do a full QM/MM calculation on the 
mechanism using substrate S2. The cysteine protease active site is 
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located on the surface of the protein, Figure 3, and the inhibitor 
forms a number of hydrogen bonds with the protein to maintain 
its binding mode. Inside the active site of the protein, the nitrile 
functionality reacts with Cys25, assisted by Hisi62- Visualization of 
the protein structure identifies a covalent linkage between Cys25 
and the substrate, hence is in a product-like P conformation. 

To understand the effect of protein and solvent on the mecha- 
nism of these inhibitors we decided to do a full QM/MM study 
starting from the described crystal structure. We initially opti- 
mized the reactant (R) and product (P) geometries using our 
QM/MM model and extracts of these structures are given in 
Figure 4. We used two procedures for the geometry optimiza- 
tions: (i) with mechanical embedding included and (ii) with 
electronic embedding included. Thus, it occasionally happens 
that structures and consequently relative energies are dependent 
on the full environmental perturbations of the whole protein on 
the active site and with electronic embedding these interactions 
are included in full. The reactant complex has the inhibitor in a 
conformation where the nitrogen atom is in hydrogen bonding 
distance to the proton of the imidazole group of Hisi62 at a rel- 
atively short distance of 1.519 A. Hence the inhibitor molecule 
will be strongly bound in the substrate binding pocket in a spe- 
cific orientation. The sulfur atom of the cysteine residue is located 



at the relatively large distance of 2.166 A to the carbon atom 
of the nitrile. This geometry changes somewhat when electronic 
embedding is included and the histidine moves to a position 
in hydrogen bonding distance to the thiolate of Cys25, which 
results in somewhat larger distance away from the substrate. In 
the mechanically embedded structure the thiolate also forms a 
hydrogen bonding interaction with the peptide backbone N-H of 
Alai6i (2.488 A) as well as a weak hydrogen bond with the hydro- 
gen atom of the H-C a atoms of Hisi62 at a distance of 2.636 A. In 
the product complexes these hydrogen bonding interactions are 
lost and the peptide chain containing the Alai6i-Hisi62 residues 
has moved away from the Cys-inhibitor linkage. Thus the nitro- 
gen atom of the imidazole group of His 1 62 has formed a weak 
hydrogen bonding interaction with one of the hydrogen atoms of 
the CH2 group of Cys25 instead (distance of 2.227 A). By con- 
trast, the thioester group of Cys25 interacts with the H-C a group 
of Hisi62 (distance of 2.830 A) in the product complexes, and 
consequently considerable changes in the protein have occurred 
during the reaction process. As a result of this the relative energies 
for the potential energy profile has undergone some drastic 
changes. 

The calculated reaction energy for the process from Rqm/mm 
to Pqm/mm using mechanical embedding only is — 20.0kcal 
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FIGURE 3 I QM/MM model and QM/MM active site as based on the 1U9V cathepsin K structure. 




FIGURE 4 | Optimized geometries with bond lengths in angstroms of reactant and product geometries for nitrile-based inhibitors of cathepsin K as 
calculated with QM(B3LYP/6-311G)/MM. Data in square brackets were obtained with electronic embedding included. 



Frontiers in Chemistry | Theoretical and Computational Chemistry 



December 2013 | Volume 1 | Article 39 | 6 



Quesne et al. 



Cysteine protease inhibition 



mol -1 , which in comparison to the QM-cluster calculations 
described above, is indicative of an irreversible process. However, 
it is known from experiment that the reaction is reversible, which 
means that the energeties calculated with mechanical embed- 
ding are not realistic. In particular, enzymatic studies on this 
system have shown that covalent binding of the inhibitor is a 
reversible process. The calculations with mechanical embedding 
included, therefore, are a poor representation of the actual inhi- 
bition process. To find a more realistic pathway, we subsequently 
recalculated the QM/MM calculations with electronic embedding 
included. As will be shown in the next paragraphs these calcula- 
tions predict an almost thermoneutral reaction mechanism and 
consequently, a reversible reaction mechanism in agreement with 
experiment. Most probably the polarity of the inhibitor binding 
site and the enzyme makes it essential to include full electronic 
embedding in the calculations. 

In a subsequent set of calculations we explored the poten- 
tial energy profile between reactants and products in detail using 
QM/MM with both mechanical and electronic embedding mod- 
els. To this end we initially did a full geometry scan, whereby the 
C-S bond formation was kept fixed at specific distances while we 
optimized all other degrees of freedom starting from Rqm/mm- In 
addition, we did geometry scans in the reverse direction starting 
from Pqm/mm for the proton transfer back from the CN-H group 
to Hisi62- These geometry scans gave insight onto the potential 



energy profile of the reaction and the maxima from the scans 
were used as starting points for the transition state searches. Our 
optimized transition state geometries and potential energy land- 
scapes are shown in Figure 5. The barrier is characterized by a 
large imaginary frequency of z'589.6 cm -1 that implicates the pro- 
ton transfer from His 1 62 to the nitrile group. It is also substantially 
higher in energy than the one calculated for the DFT model above 
and we find it 23.1 kcal mol -1 in energy above the reactants using 
the mechanical embedding procedures. Geometrically, the C-S 
distance is in between that found for reactants and products. 
Moreover, an attempt to optimize a local minimum and a sub- 
sequent C-S bond formation transition state failed and led to 
reactant structures in all cases. Therefore, the reaction proceeds 
in a concerted mechanism via a transition state that is domi- 
nated by the proton transfer motion. Using electronic embedding 
corrections the barrier height drops slightly to 19.9 kcal mol -1 . 
Therefore, electronic embedding has a small barrier-lowering 
effect on the calculations. The nucleophilic addition barrier cal- 
culated with QM/MM is in perfect agreement with that found for 
native cysteine proteases, for which a value of 19.8 kcal mol -1 was 
calculated (Ma et al, 2007). Therefore, the electronic embedding 
corrections appear to have a large effect on the overall thermody- 
namics of the reaction but only a minor effect on the kinetics. 
The electronically embedded reaction energy is close to ther- 
moneutral and very similar to that found for the QM-cluster 




FIGURE 5 | (A) Optimized geometry with bond lengths in angstroms of 
TSqm/mm for nitrile-based inhibitors in cathepsin K as calculated with 
QM(B3LYP/6-311G)/MM. (B) Potential energy landscape from reactants to 



products as obtained with QM/MM. Energies are in kcal mol -1 and contain 
ZPE corrections. Values in square brackets were obtained with electronic 
embedding included. 







FIGURE 6 | (A) Overlay of optimized geometries of Rqm/mm (red), TSqm/mm (amber) and Pqm/mm (blue) structures. (B) Overlay of TSqm/mm (amber) and 
TSe,si (violet). QM/MM structures obtained with mechanical embedding only. 
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results described above. Experimentally, it has been shown that 
the reaction is reversible, which implicates an almost thermoneu- 
tral reaction mechanism, in agreement with our QM-cluster 
and QM/MM with electronic embedding results. Clearly, the 
mechanical embedding QM/MM calculations overestimate the 
overall thermodynamics of the reaction, although the kinetics is 
reasonable. 

In order to understand the large differences and reaction ener- 
getics between the DFT model study and the QM/MM results, we 
investigated structural differences between the optimized geome- 
tries first, see Figure 6. Panel Figure 6A displays an overlay of the 
three QM/MM optimized geometries. Despite the fact that these 
structures have resulted from a full geometry optimization with- 
out constraints, actually the peptide chain is virtually in the same 
position for all geometries. In the active site, there are some obvi- 
ous structural differences due to formation of a linkage between 
Cys25 and the inhibitor, which is absent in the reactant. This bond 
formation changes the position of the histidine side-chain as well 
as other proton donors/hydrogen bonding groups to the active 
site, including the Trpi84 and Glni9 residues. 

The overlay of the two transition state structures for sub- 
strate SI activation as obtained by DFT and QM/MM is given in 
Figure 6B. Even though the DFT model complex had four fixed 
carbon atoms in positions of the original protein structure, the 
full geometry optimization of the reaction mechanism led to a 
structure that significantly deviated from the actual enzyme con- 
formation. Thus, the overlay of TSqm/mm and TSe, si shows a 
twist of the substrate by almost 90° as well as reorientation of 
the protein backbone. In the model complex, there is sufficient 
geometric flexibility so that the system will relax to its most ideal 
configuration. In the enzyme, by contrast, steric interactions of 
the protein structure prevent reorientation of the geometry and 
the most ideal configurations as those found for the DFT model 
complexes are not stable. Because of that the energetics of the 
potential energy profile has shifted dramatically and the nucle- 
ophilic addition step is raised in energy. This is indeed what was 
found on native cysteine proteases, where it was shown that dis- 
ruption of the hydrogen-bonding network led to proton affinity 
differences of the histidine and cysteine groups (Mladenovic et al., 
2008b). Based on these studies Engels et al. predicted enhanced 
nucleophilic addition reactivity upon removal of the hydrogen- 
bonding network. Indeed that is what we find when we compare 
the studies of the DFT model complex (without the water layer) 
and the QM/MM result. Recent work of ours on the reactivity of 
iron(IV)-oxo and manganese ( IV) -oxo intermediates in aliphatic 
reaction mechanisms also showed reduced reactivity for com- 
plexes whereby the active oxidant underwent hydrogen bonding 
(accepting) interactions so that this may be a general feature in 
enzymatic reaction mechanisms (Latifi et al., 2013). 

CONCLUSIONS 

We report a combined DFT and QM/MM study into the mech- 
anism of inhibition of nitrile-based inhibitors to cathepsin K. 
We show here that a DFT model complex gives dramatic differ- 
ences in optimized geometries and relative energies as compared 
to QM/MM. We show that despite geometric constraints the DFT 
structure has undergone dramatic changes with respect to its 



original conformation and does not fit the enzymatic pocket any- 
more when an attempt is made to reinsert the structure into the 
protein. These changes affect the relative energies of the reaction. 
All calculations predict a concerted nucleophilic addition reaction 
with simultaneous C-S bond formation and proton transfer. We 
also did QM/MM calculations and show that a procedure with full 
electronic embedding included is necessary to describe an irre- 
versible reaction process. This again highlights the importance of 
model and method choice in quantum chemical calculations. 
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